Abstract 

Background: Fall events contribute significantly to mortality, morbidity and costs in our ageing population. In 
order to identify persons at risk and to target preventive measures, many scores and assessment tools have been 
developed. These often require expertise and are costly to implement. Recent research investigates the use of 
wearable inertial sensors to provide objective data on motion features which can be used to assess individual fall 
risk automatically. So far it is unknown how well this new method performs in comparison with conventional fall 
risk assessment tools. The aim of our research is to compare the predictive performance of our new sensor-based 
method with conventional and established methods, based on prospective data. 

Methods: In a first study phase, 119 inpatients of a geriatric clinic took part in motion measurements using a 
wireless triaxial accelerometer during a Timed Up&Go (TUG) test and a 20 m walk. Furthermore, the St. Thomas 
Risk Assessment Tool in Falling Elderly Inpatients (STRATIFY) was performed, and the multidisciplinary geriatric care 
team estimated the patients' fall risk. In a second follow-up phase of the study, 46 of the participants were 
interviewed after one year, including a fall and activity assessment. The predictive performances of the TUG, the 
STRATIFY and team scores are compared. Furthermore, two automatically induced logistic regression models based 
on conventional clinical and assessment data (CONV) and on sensor data (SENSOR) are compared. 

Results: Among the risk assessment scores, the geriatric team score (sensitivity 63%, specificity 50%) outperforms 
STRATIFY and TUG. The induced logistic regression models CONV and SENSOR achieve similar performance values 
(sensitivity 68%/58%, specificity 74%/78%, AUC 0.74/0.72, +LR 2.64/2.61). Both models are able to identify more persons 
at risk than the simple scores. 

Conclusions: Sensor-based objective measurements of motion parameters in geriatric patients can be used to 
assess individual fall risk, and our prediction model's performance matches that of a model based on conventional 
clinical and assessment data. Sensor-based measurements using a small wearable device may contribute significant 
information to conventional methods and are feasible in an unsupervised setting. More prospective research is 
needed to assess the cost-benefit relation of our approach. 



Background 

It is well known that fall events constitute an important 
factor with regard to mortality, morbidity and costs in 
our aging population. These events have a high incidence, 
especially in the elderly: 25.1% of the men and 37% of the 
women aged 65 years and above fall at least once within 
12 months [1]. The highest incidence is reported for geriatric 
inpatients [1], who often have several risk factors 
[2] at the same time and suffer from multiple diseases: 
the prevalence rate of five or more somatic diseases for 
persons aged 70 years and above has been reported in 
the Berlin Aging Study to be 88% [3]. As fall events and 
their consequences are very costly - an estimated annual 
$19.2 billion in the U.S. [4] - preventive measures have 
been investigated intensively [5]. These measures themselves 
are costly, so that two predominant questions are: 
Who should be treated in the first place, and who should 
receive which kind of preventive measure? 

In order to identify persons at risk of falling - who are thus 
eligible for preventive treatment - many risk 
assessment tools, e.g. the Timed Up&Go test (TUG) [6] 
or the St. Thomas Risk Assessment Tool in Falling 
Elderly Inpatients (STRATIFY) [7] have been developed 
and evaluated in a multitude of studies. Comprehensive 
reviews can be found e.g. in [2,8,9]. Several tests have 
also been used to predict falls in outpatients, often with 
a specific group of patients. Kikuchi et al. report that, in 
a prospective study with 79 patients having a diagnosis 
of cognitive impairment and lasting 12 months, only 
their fall-predicting score, a self-answered 21-item ques- 
tionnaire, was predictive of future falls [10], but not e.g. 
the TUG. The latter test was, in contrast, found to be the 
only predictive parameter for falls in patients after hip 
surgery in a 6-month prospective study by Kristensen 
et al. [11]. Hale et al. found that mobility scores were 
not associated with falls in a 12-month prospective 
study with 120 geriatric outpatients, but history of falls 
was [12]. Oliver et al. conclude that even the best tools 
are not able to identify the majority of fallers [9]. Keep- 
ing this in mind along with the often time-consuming 
nature of fall risk assessment tests (e.g. the Perfor- 
mance-Oriented Mobility Assessment, POMA [13]) that 
frequently require expert knowledge, several research 
groups have developed the idea to perform a sensor- 
based automatic or semi-automatic assessment using 
wearable inertial sensors [14-17]. Apart from offering 
continuous and objective data, this approach may also 
serve to detect fall events once they have happened. 
This matters because many falls go undetected 
and a person may lie injured for hours or even days 
at home. Despite promising first results of the sensor-based 
approach developed by the authors [18], it 
remains unclear how well the new methods perform in 
comparison with conventional fall risk assessment tools. 

Therefore, the aim of our research work for this paper 
is to examine the predictive performance of our new 
sensor-based method for fall risk assessment in compar- 
ison with conventional and established methods. The 
comparison is based on one-year follow-up data 
obtained in a prospective study. 

Methods 

General approach 

We recruited a sample of geriatric patients - who are 
known to have the highest risk of falling [1] - and per- 
formed selected conventional assessment tests which are 
used to determine fall risk. Furthermore, fall risk was 
assessed by the interdisciplinary geriatric care team 
(physicians, nurses, physiotherapists, occupational therapists) 
of the Department for Geriatric Medicine at 
Braunschweig Medical Center in Germany. In addition to 
these measures, a sensor-based assessment of fall risk was 
performed, which employs gait and motion parameters 
obtained during a Timed Up&Go test and a 20 m walk 
during the patients' hospital stay. These parameters are: 
kinetic energy, pelvic sway along the transversal axis, standard 
deviation of gait periodicity, mean step duration, step 
length, number of steps during the Timed Up&Go test and a 
number of spectral density distribution parameters such as 
the frequency of the most prominent spectral density 
peak. The sensor equipment used, the methods for parameter 
extraction from raw data and the generation of prediction 
models are explained in detail in [18,19]. The 
study protocol has been approved by the Hanover Medical 
School ethics committee. 
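
As an illustration of one of the spectral parameters named above, the frequency of the most prominent spectral density peak, the following Python sketch estimates the dominant frequency of an acceleration signal with a plain discrete Fourier transform. It is not the extraction method described in [18,19]; the function name `dominant_frequency`, the 50 Hz sampling rate and the synthetic test signal are assumptions made only for this example.

```python
import cmath
import math


def dominant_frequency(samples, sampling_rate_hz):
    """Return the frequency (Hz) of the largest non-DC peak in the power spectrum."""
    n = len(samples)
    if n < 4:
        raise ValueError("need at least four samples")
    mean = sum(samples) / n
    centred = [s - mean for s in samples]  # remove the constant (gravity/DC) offset
    best_bin, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):  # positive frequencies only
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centred))
        power = abs(coeff) ** 2
        if power > best_power:
            best_bin, best_power = k, power
    return best_bin * sampling_rate_hz / n


# Synthetic signal (assumption): a 2 Hz oscillation sampled at 50 Hz for 2 s.
fs = 50
signal = [math.sin(2 * math.pi * 2.0 * i / fs) for i in range(2 * fs)]
print(dominant_frequency(signal, fs))
```

The naive transform is quadratic in the number of samples, which is acceptable for short test recordings; a fast Fourier transform would be the usual choice for long-term data.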

Study population 

The study population consisted of geriatric inpatients of the 
Braunschweig Medical Center's Department for Geriatric 
Medicine who met the following inclusion criteria: 

♦ admission between April 24th and October 18th, 2007 

♦ ability to stand up and walk 

♦ written consent (also by a third party) to take part 
in the follow-up interviews 

Although 119 patients took part in the first phase of the 
study, including the geriatric fall risk assessment tests and 
the sensor measurements, only 50 were able or willing to 
participate in the follow-up investigation (37 women and 
13 men). The reasons for drop-outs were: death (n = 17), 
untraceable (n = 13), no answer to the letter asking for 
written consent (n = 27), rejecting the telephone interview 
despite previous consent (n = 4), and several other pro- 
blems such as deafness or cognitive impairment (n = 8) 
[18]. Four of the motion sensor data sets were corrupted, 
so that altogether 46 persons (mean age 81.3 years) could 
be included in the fall risk assessment performance comparison 
study. Given the high number of drop-outs, 
we compared the group of participants with the 
drop-outs with respect to relevant clinical parameters and 
fall risk assessments (age, body mass index, Timed 
Up&Go test, geriatric team risk score as described below, 
STRATIFY score and Barthel index score). There were no 
statistically significant differences between the two groups, 
except for a higher STRATIFY score in the drop-out 
group (2.8 vs. 2.3, p = 0.036). As mental alteration is one 
of the STRATIFY items, this result may indicate that persons 
with cognitive impairments were more likely to 
drop out, either by not giving written consent or by dying 
during follow-up. In a further 
analysis of STRATIFY sub-items, however, no significant 
group differences could be identified. 

Conventional geriatric fall risk assessment tests and 
additional clinical parameters 

Several clinical parameters were recorded and geriatric assessment 
tests were performed by clinical staff members on 
admission to the geriatric department. The clinical parameters 
were age, sex and body mass index (BMI). 
Furthermore, the St. Thomas Risk Assessment Tool in 
Falling Elderly Inpatients (STRATIFY) score [7] and the 
Barthel index [20] were assessed, and the Timed Up&Go 
test [6] was performed, followed by an additional 20 m walk. 

Apart from these standardized methods, the multidis- 
ciplinary geriatric care team assessed individual fall risk 
using a self-developed rating scale, which simply con- 
sists of three values: no risk, low risk and high risk. 

In addition to the above-mentioned assessments which 
are performed during hospital stays, in the second phase 
of the study - conducted one year after the first phase - we 
used telephone interviews in order to assess fall events 
within the last year as well as general physical activity 
levels. A detailed account of physical activity levels among 
fallers and non-fallers can be found in [21]. The fall- 
related interviews were structured in accordance with the 
Prevention of Falls Network Europe (ProFaNE) consensus 
criteria [22] and are described in more detail in [18]. 

Sensor equipment 

During the inpatient phase of the study, all participants 
wore a wireless triaxial accelerometer system (Freescale 
RD3152MMA7260Q) on a belt around the waist (Figure 1) 
during the Timed Up&Go test and the 20 m walkway in 
the physiotherapy department of the clinic. The data were 
transmitted to a PC and stored for processing. 

Classification model induction and evaluation 

In order to evaluate the predictive performance of the various 
fall risk assessment tools, we chose a two-fold 
approach. First of all, we used the two dedicated fall risk 
assessment scales - the STRATIFY score and the Timed 
Up&Go test - along with the care team score to distin- 
guish between fallers and non-fallers within the year of fol- 
low-up. The cut-off points were set as proposed by the 
developers in their original publications: > 2 points for the 
STRATIFY score [7], > 20s for the Timed Up&Go test [6] 
and no risk vs. low/high risk for the team score. For each 
score, we constructed a contingency table and calculated 
sensitivity (SENS), specificity (SPEC), positive (PPV) and 
negative predictive values (NPV) as well as overall classifi- 
cation accuracy (ACC). 
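
The calculation of these measures from a contingency table can be sketched in Python as follows. This is a minimal illustration rather than the software used in the study; the class name `ContingencyTable` and its field names are assumptions, and the example counts are those of the STRATIFY contingency table (Table 1). Rounding may differ from the published percentages by one point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    """2x2 table: predicted risk (rows) against observed falls (columns)."""
    tp: int  # predicted at risk, fell within one year
    fp: int  # predicted at risk, did not fall
    fn: int  # predicted not at risk, fell within one year
    tn: int  # predicted not at risk, did not fall

    def metrics(self) -> dict:
        total = self.tp + self.fp + self.fn + self.tn
        return {
            "SENS": self.tp / (self.tp + self.fn),
            "SPEC": self.tn / (self.tn + self.fp),
            "PPV": self.tp / (self.tp + self.fp),
            "NPV": self.tn / (self.tn + self.fn),
            "ACC": (self.tp + self.tn) / total,
        }


# Counts taken from the STRATIFY contingency table (Table 1).
stratify = ContingencyTable(tp=15, fp=20, fn=4, tn=7)
for name, value in stratify.metrics().items():
    print(f"{name}: {value:.1%}")
```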

Figure 1 Triaxial accelerometer sensor (Freescale 
RD3152MMA7260Q), sensor casing and belt. No skin contact is 
necessary for the sensor function. 

In a second step, we used all clinical and fall risk 
assessment parameters to automatically induce a classification 
model (model CONV). The same has been done 
previously for the parameters that were extracted from 
sensor data and overall activity (model SENSOR) [18]. 
We chose to use a logistic regression model because of its 
known stability and performance in small 
data sets and employed the open-source toolkit Waikato 
Environment for Knowledge Analysis (WEKA, version 
3.7.1, simple logistic algorithm; parameters: error on 
probabilities = true, heuristic stop = 50, maximum num- 
ber of iterations for LogitBoost = 5000, use cross valida- 
tion = true, no weight trimming) [23]. For model 
induction, the binary attribute fall within the last year 
(yes/no) was used. The multi-parameter data sets were 
pre-processed using a feature selection algorithm in 
order to exclude parameters with low information 
(WEKA, version 3.7.1, wrapper subset evaluator, employ- 
ing the simple logistic regression algorithm as described 
above). All automatically induced classification models 
were evaluated using a ten times ten-fold cross-validation 
procedure, and SENS, SPEC, PPV, NPV, ACC and the 
area under the curve (AUC) were calculated. Furthermore, 
we computed each model's Brier score as a 
performance measure [24], which is the mean squared 
difference between the observed outcome and the pre- 
dicted probability for each instance in the data set. 
Finally, the positive likelihood ratios (+LR) of all models 
were calculated as an additional performance measure 
that spans the whole of each contingency table. 
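
Two elements of this evaluation procedure, the repeated ten-fold split and the Brier score, can be sketched in Python as follows. This is an illustrative sketch, not the WEKA implementation used in the study: the function names `repeated_kfold` and `brier_score` are assumptions, and the folds are drawn by plain shuffling without stratification, which is a simplification.

```python
import random


def brier_score(observed, predicted):
    """Mean squared difference between outcomes (0/1) and predicted probabilities."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def repeated_kfold(n_samples, k=10, repeats=10, seed=0):
    """Yield (repeat, train_indices, test_indices) for repeated k-fold CV.

    Unstratified shuffling is an assumption of this sketch.
    """
    rng = random.Random(seed)
    for repeat in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        for fold in range(k):
            test = order[fold::k]  # every k-th shuffled index forms one fold
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield repeat, train, test


# 46 participants, ten repetitions of ten folds each.
splits = list(repeated_kfold(n_samples=46))
print(len(splits))
print(brier_score([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5]))
```

For orientation, a model that always predicts a probability of 0.5 obtains a Brier score of 0.25, so lower values indicate better probabilistic predictions.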

Results 

Tables 1, 2 and 3 show the results of the single fall risk 
assessment tests for the prediction of actual fall events 
within a year after discharge from the geriatric ward. In 
Table 4, all +LR values are presented. The STRATIFY 
score (Table 1), a dedicated fall risk tool, has an overall 
classification accuracy of 48% with a good sensitivity of 
79% but a low specificity of 26%. While the NPV is 63%, 
the PPV is only 43%, meaning that a positive assessment 
result does not predict actual falls well. 

Table 1 Classification results and contingency table for 
the STRATIFY score [7] (cut-off point > 2 points) 

STRATIFY score 
classification accuracy      48% 
sensitivity                  79% 
specificity                  26% 
negative predictive value    63% 
positive predictive value    43% 

contingency table    fall within one year 
                     yes    no    Sum 
pred. yes            15     20    35 
pred. no              4      7    11 
sum                  19     27    46 



The Timed Up&Go test results in Table 2 show an 
overall classification accuracy of 50%, where a high sensitivity 
of 90% is offset by a very low specificity of 
22%. As with the STRATIFY score, the NPV (75%) is 
markedly higher than the PPV (45%). 

The geriatric team's fall risk assessment score 
(Table 3) shows more balanced, though not 
much better, results: a classification accuracy of 55% 
is accompanied by a sensitivity of 63% and a specificity 
of 50%. The NPV is 68% and the PPV is 44%. The 
+LR values (Table 4) of all three simple fall risk 
assessments (1.07, 1.15 and 1.25) confirm their low 
predictive power, yet among these the team score has 
the highest classification accuracy. 

The automatically generated classification model 
CONV (Table 5) based on clinical and geriatric assessment 
data shows markedly better performance values 
than the three previous tests: the classification accuracy 
is 72% with a sensitivity of 68% and a specificity of 74%. 
Furthermore, the NPV (77%) and the PPV (65%) 
are balanced and on a fair level. The overall good performance 
of this model is also shown by the Brier 
score of 0.20, an AUC of 0.74 and a statistically significant 
+LR value of 2.64 (Table 4). 

The classification model SENSOR (Table 6, [18]) 
performs similarly to the CONV model: classification 
accuracy is 70%, with a sensitivity of 58% and a 
specificity of 78%. NPV (72%) and PPV (65%) are at a 
comparable level. The +LR value of 2.61, however, does not reach 
statistical significance, as its broader 95% confidence 
interval (0.94-7.26) includes 1. 
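
The +LR point estimates follow directly from the contingency tables, as the short Python sketch below shows. The function name `positive_likelihood_ratio` is an assumption made for this illustration; the counts are read from the contingency tables in Tables 1-3, 5 and 6. The confidence intervals of Table 4 are not recomputed here, because the method used to derive them is not stated in this section.

```python
def positive_likelihood_ratio(tp, fp, fn, tn):
    """+LR = sensitivity / (1 - specificity)."""
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    if false_positive_rate == 0:
        raise ZeroDivisionError("+LR is undefined when there are no false positives")
    return sensitivity / false_positive_rate


# Counts (tp, fp, fn, tn) as read from Tables 1-3, 5 and 6.
tables = {
    "STRATIFY score": (15, 20, 4, 7),
    "Timed Up&Go test": (17, 21, 2, 6),
    "Team assessment": (10, 13, 6, 13),
    "model CONV": (13, 7, 6, 20),
    "model SENSOR": (11, 6, 8, 21),
}

for name, counts in tables.items():
    print(f"{name}: +LR = {positive_likelihood_ratio(*counts):.2f}")
```

A +LR close to 1 means that a positive result barely changes the odds of a fall, which is the case for the three simple scores.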

Table 2 Classification results and contingency table for 
the Timed Up&Go test [6] (cut-off point > 20s) 

Timed Up&Go test 
classification accuracy      50% 
sensitivity                  90% 
specificity                  22% 
negative predictive value    75% 
positive predictive value    45% 

contingency table    fall within one year 
                     yes    no    Sum 
pred. yes            17     21    38 
pred. no              2      6     8 
sum                  19     27    46 



Table 3 Classification results and contingency table for 
multidisciplinary geriatric team fall risk score (4 missing 
values) 

TEAM assessment 
classification accuracy      55% 
sensitivity                  63% 
specificity                  50% 
negative predictive value    68% 
positive predictive value    44% 

contingency table    fall within one year 
                     yes    no    Sum 
pred. yes            10     13    23 
pred. no              6     13    19 
sum                  16     26    42 


Discussion 

The performances of the simple fall risk assessment 
tools used in this study - the STRATIFY score, the 
Timed Up&Go (TUG) test and the geriatric care team 
rating - are limited. In a recent meta-analysis Oliver et 
al., who have developed the STRATIFY score, report the 
following values for geriatric patients: SENS 67.2%, 
SPEC 51.2%, PPV 23.1% and NPV 86.5% (n = 1285 
patients, four different studies) [9]. Kim et al. have also 
evaluated the STRATIFY score, albeit with a much 
younger cohort (mean age 56 years, n = 5489 patients, 
60 fallers), and find: SENS 55%, SPEC 75.3%, PPV 2.4% 
and NPV 99.3% [25]. Our results show a slightly worse 
performance than was reported in the meta-analysis by 
Oliver et al. [9]. This may well be due to our very small 
sample size. The same applies to the Timed Up&Go 
test. Nordin et al. have studied the predictive validity of 
the TUG in 183 patients with a mean age of 84.3 years 
and a cut-off point of 20s [26]. They report a sensitivity 
of 79% and a specificity of 32%. In the large Tromsø 
study, Thrane et al. find sensitivity values of 44-14% and 
specificity values of 58-90% for the TUG, depending on 
the choice of the cut-off points (here between 12 and 
17s) [27]. Kristensen et al. report SENS 95%, SPEC 35%, 
PPV 41%, NPV 93%, +LR 1.5, -LR 0.1 for the TUG's 
predictive performance for patients after hip surgery 
(mean age 81 years, n = 59 patients, 19 fallers, cut-off 
value 24s) [11]. Our results (Table 2) also show a 
remarkable sensitivity of 90% for the TUG, yet the specificity 
is far too low for a screening test. This is confirmed 
by the low +LR value of 1.15. 



Table 4 +LR values of all five classification models 
including the confidence intervals 

model name          +LR value    95% confidence interval 
STRATIFY score      1.07         0.71-1.61 
Timed Up&Go test    1.15         0.83-1.59 
Team Assessment     1.25         0.63-2.49 
model CONV          2.64         1.07-6.5 
model SENSOR        2.61         0.94-7.26 


Table 5 Classification results and contingency table for 
logistic regression model based on clinical data and fall 
risk assessment tests 

model CONV 
classification accuracy      72% 
sensitivity                  68% 
specificity                  74% 
negative predictive value    77% 
positive predictive value    65% 
Brier score                  0.20 
AUC                          0.74 

contingency table    fall within one year 
                     yes    no    Sum 
pred. yes            13      7    20 
pred. no              6     20    26 
sum                  19     27    46 



Both the STRATIFY score and the TUG are very simple to perform, 
the former by history taking and the latter by conducting a simple 
physical test, and both take only a couple of minutes. Therefore, 
these tests may serve well - and in fact are frequently used - as general 
screening methods, if necessary followed by more complex, 
multimodal assessment inventories such as the Physiological 
Profile Assessment (PPA) [28]. 

The geriatric care team fall risk score may be perceived 
as a very subjective measure, yet it represents the profes- 
sional opinion of several experienced experts that is very 
likely based on an intuitive understanding of the complex 
concept 'fall risk' as well as on a multitude of observa- 
tions of a certain patient. This solid foundation is 
reflected by the fair performance values of this score, 
which are the most balanced of the three simple tests. 
Similar results have been found in [26], where 'global rat- 
ing of fall risk' (low/high) by staff members achieved a 
sensitivity of 56% and a specificity of 80%. 

The automatically induced model CONV (Table 5) 
shows better performance values than all of the above 
tests. This is of course due to the approach of including 
basic clinical data such as sex, BMI and age in the induction 
process, but also to the combination of different 
assessment methods ranging from a physical test (TUG) 
over a measure of daily activity capability (Barthel index) 
to a fall risk score (STRATIFY). In the induction process, 
the most relevant parameters or scores are identified and 
included, so that performance is optimized. The multitude 
of candidate parameters may capture the multi-factorial 
concept of fall risk more adequately than a single 
test. The performance measures show that CONV 
can identify most of the fallers and non-fallers correctly, 
based on their one-year outcome. Thus, this model could 
be suitable as a screening test for geriatric patients, facilitating 
the prescription of preventive measures. 

Table 6 Classification results and contingency table for 
logistic regression model based on sensor data and long-term 
physical activity level 

model SENSOR 
classification accuracy      70% 
sensitivity                  58% 
specificity                  78% 
negative predictive value    72% 
positive predictive value    65% 
Brier score                  0.21 
AUC                          0.72 

contingency table    fall within one year 
                     yes    no    Sum 
pred. yes            11      6    17 
pred. no              8     21    29 
sum                  19     27    46 

When compared to the CONV model, the previously computed SENSOR 
model, which is based merely on accelerometer sensor 
data and overall activity levels, performs almost as well. 
The Brier scores (0.21 vs. 0.20) are nearly the 
same, as are the AUC (0.72 vs. 0.74) and +LR values 
(2.61 vs. 2.64). Therefore, regarding our preliminary 
results we may conclude that by using sensor data - 
which may be recorded over extended periods of time 
during normal daily activities with a small and unobtru- 
sive device - we can match the performance of conven- 
tional methods with regard to fall risk assessment in a 
sample of geriatric patients. The advantage of our 
approach is that no expert physiotherapist, nurse or 
physician is needed to perform the assessment. 
This could be done by the wearable device itself, 
using long-term motion data along with the developed 
algorithm. Potential drawbacks of our approach are pos- 
sible technical failures such as data loss from the acceler- 
ometer device, acceptance issues, limited battery lifetime 
and the current lack of technical infrastructures (e.g. sen- 
sor-enhanced health information systems [29,30]). Tech- 
nical equipment and infrastructures are of course costly, 
but so are falls and their consequences in the first place. 

Future prospective studies will have to be conducted 
with more patients and over an even longer period of 
time, evaluating the validity of our approach and our 
preliminary results in an independent patient sample on 
the one hand and the cost-benefit relation on the other 
hand. From an ethics point of view, however, one might 
argue that every fall that is avoided is a big benefit for 
the individual. 

From a technical point of view, more research work is 
needed to look into the potential predictive parameters 
that can be extracted from sensor data [31]. Further- 
more, in this study we have not considered any informa- 
tion from the patients' electronic health records (EHRs) 
[32], such as diagnoses or additional history. Considering 
the multi-factorial aetiology of falls [33], our sensor-based 
information may well be used in combination 
with, or as a supplement to, conventional geriatric 
assessment tools and other clinical data. 

Marschollek et al. BMC Medical Informatics and Decision Making 2011, 11:48 
http://www.biomedcentral.com/1472-6947/11/48 

Limitations 

Our sample size is small and this - in comparison with 
large trials evaluating the conventional methods - limits 
the generalizability of our results. So does the fact that, 
due to the sample size, we cannot use separate training 
and test data sets for model induction. Nevertheless, we 
have chosen a well-established procedure to avoid overfitting 
of our models, namely ten times ten-fold cross-validation. 
The necessity for written consent to be 
returned by the patients via surface mail may have led to 
the exclusion of persons with cognitive impairments, 
even though consent by a third party was an option. 
Furthermore, in our follow-up study telephone interviews 
were used to identify fall events. This approach is error- 
prone, as many factors (e.g. cognitive impairment) may 
affect the recall of such events. In addition to this, con- 
sidering the group of patients, within a period of twelve 
months risk factors may have changed significantly. 
Therefore, daily recordings as well as more frequent 
interviews, e.g. on a monthly basis as recommended in 
ProFaNE consensus criteria recommendation no. 7 [22], 
might have reduced the error rate, but have not been per- 
formed due to a lack of resources. Hence, in our future 
prospective studies, we will include a more stringent 
monitoring approach. 

From an economic perspective, it remains unclear if the 
prediction results are good enough to justify the imple- 
mentation of costly preventive measures for the false 
positives [5]. A cost-benefit analysis should be conducted, 
comparing direct and indirect costs of fall events with 
those of preventive measures. Furthermore, despite pro- 
mising preliminary studies (e.g. [34-36]), the patients' 
acceptance of long-term monitoring should be assessed, 
e.g. using the Sensor Acceptance Model [37]. 

Conclusions 

To the authors' knowledge, this is the first study to com- 
pare sensor-based fall risk assessment to conventional 
assessment tools in a prospective long-term setting. Our 
preliminary results indicate that a fall risk model based on 
accelerometer sensor data performs almost as well as a 
model that is derived from conventional geriatric assess- 
ment data. Therefore we may conclude that such a model 
can provide relevant information and thus - considering 
the multi-factorial aetiology of fall events - could be valuable 
not as a replacement of professional assessment 
scores and tools, but as a supplementary method that can 
feasibly be used outside of a supervised clinical 
environment. 